Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis
Abstract
BACKGROUND Multiple imputation is a recommended method for handling missing data. For significance testing after multiple imputation, Rubin's Rules (RR) are easily applied to pool parameter estimates. Several methods are available to test whether a categorical covariate with more than two levels contributes significantly to a logistic regression model, for example pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR, are not available in all mainstream statistical software packages, and do not always achieve optimal power. We argue that the median of the p-values from the overall significance tests in the analyses of the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods for testing a categorical variable for significance after multiple imputation in terms of applicability and power.
METHODS In a large simulation study, we examined type I error control and power levels of different pooling methods for categorical variables.
RESULTS The simulation study showed that the type I error was controlled for non-significant categorical covariates and that the statistical power of the median pooling rule was at least equal to that of current multiparameter tests. An empirical data example showed similar results.
CONCLUSIONS Using the median of the p-values from the imputed data analyses is therefore an attractive and easy-to-use alternative method for significance testing of categorical variables.
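The median pooling rule needs only the result of the overall test in each imputed dataset. The Python sketch below assumes that each of the m imputed datasets has already been analyzed and has produced one overall chi-square statistic for the categorical covariate (for example a Wald or likelihood ratio test of its k − 1 dummy coefficients). The helper names `chi2_sf`, `median_p_pool`, and `rubin_pool` are hypothetical, the test statistics in the example are invented numbers, and the multiparameter pooling methods compared in the study are not implemented here. For contrast, `rubin_pool` applies Rubin's Rules to a single coefficient.

```python
import math
import statistics


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), the recurrence of the
    regularized upper incomplete gamma function, with a = df / 2 and y = x / 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # df = 2: Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # df = 1: Q(1/2, y) = erfc(sqrt(y))
    while 2 * a < df:                         # step a up by 1 until a = df / 2
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def median_p_pool(chi2_stats, df):
    """Median pooling rule: median of the per-imputation overall-test p-values."""
    p_values = [chi2_sf(x, df) for x in chi2_stats]
    return statistics.median(p_values)


def rubin_pool(estimates, variances):
    """Rubin's Rules for one coefficient: pooled estimate and total variance."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    w = statistics.fmean(variances)           # within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                   # total variance
    return q_bar, t


if __name__ == "__main__":
    # Hypothetical overall test statistics for a 3-level covariate (2 dummies, df = 2)
    # from m = 5 imputed datasets; the numbers are invented for illustration.
    stats = [4.1, 6.3, 5.2, 3.8, 7.0]
    print("median pooled p-value:", median_p_pool(stats, df=2))
```

Because the chi-square tail probability decreases as the statistic grows, with an odd number of imputations the median p-value equals the p-value of the median statistic. The sketch uses only the standard library; in practice a statistics package's chi-square survival function would normally replace `chi2_sf`.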
Similar resources
Prediction of mental disorders after Mild Traumatic Brain Injury: principal component approach
Introduction: In process modeling, a relatively high correlation between covariates creates multicollinearity, which reduces the model's efficiency. In this study, the use of principal component analysis to mitigate the effect of multicollinearity in Artificial Neural Network (ANN) and Logistic Regression (LR) models has been studied. Also, the effect of multicolineari...
Combined subject table of contents
Statistics ANOVA and related Multiple imputation Basic statistics Multivariate analysis of variance and Binary outcomes related techniques Categorical outcomes Nonlinear regression Censored and truncated regression models Nonparametric statistics Cluster analysis Ordinal outcomes Correspondence analysis Other statistics Count outcomes Pharmacokinetic statistics Discriminant analysis Power and s...
Missing Binary Covariate Data and Imputation in Regression Models
This paper presents a simple way to handle missing values in categorical covariates, namely conditional probability imputation. Properties of this technique are given for various patterns of missing data in regression studies. An example shows its use in the proportional hazards model. The probability imputation technique is furthermore compared with multiple imputation and model-based appro...
Application of latent variables in a logistic regression model to remove the effect of multicollinearity in the analysis of some factors related to breast cancer
Background and Objectives: Logistic regression is one of the most widely used generalized linear models for analyzing the relationships between one or more explanatory variables and a categorical response. Strong correlations among explanatory variables (multicollinearity) reduce the efficiency of the model considerably. In this study we used latent variables to reduce the effects of ...
Factors Influencing Drug Injection History among Prisoners: A Comparison between Classification and Regression Trees and Logistic Regression Analysis
Background: Given the importance of medical studies, researchers in this field should be familiar with various types of statistical analysis so that they can select the most appropriate method for the characteristics of their data sets. Classification and regression trees (CARTs) can complement regression models. We compared the performance of a logistic regression model and a CART in predi...